
refactor(sandbox): replace iptables with nftables for network policy enforcement #1401

Open
russellb wants to merge 1 commit into NVIDIA:main from russellb:refactor/nftables-migration


@russellb
Contributor

@russellb russellb commented May 15, 2026

Summary

Migrate all sandbox and VM driver network policy enforcement from iptables to nftables. nftables provides atomic ruleset loading, a cleaner rule syntax, and is the standard netfilter interface in modern kernels.

Closes #1335

Changes

Sandbox bypass enforcement (openshell-sandbox):

  • Replace the chain of individual iptables rule insertions with a single atomic nftables ruleset load via nft -f
  • New nft_ruleset module with pure functions for ruleset generation and unit tests
  • Combine log and reject rules in one inet family table (handles both IPv4 and IPv6 in a single ruleset)
  • Fall back to reject-only ruleset when kernel lacks nft_log support
  • Enable net.netfilter.nf_log_all_netns so log rules work from non-init network namespaces
  • Use a temp file for nft ruleset loading instead of stdin, for compatibility with minimal VM guest environments
  • Keep tempfile in both Linux-only runtime deps and [dev-dependencies] so non-Linux test builds resolve it
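A minimal sketch of the pure-function approach, assuming a single `output` chain in an `inet` table that lets traffic through only to an assumed IPv4 proxy address and port. The struct, function, table, and log-prefix names are hypothetical, and the rule bodies are illustrative because the PR text does not show the generated ruleset. Because `nft -f` applies a file as one transaction, the `add table` / `delete table` / `table { ... }` sequence replaces any previous copy of the table atomically. The loader uses `std::env::temp_dir()` only to stay dependency-free; the PR itself uses the `tempfile` crate.

```rust
use std::fmt::Write as _;
use std::path::{Path, PathBuf};
use std::process::Command;

/// Hypothetical inputs for the in-guest bypass ruleset (names are illustrative).
pub struct BypassPolicy<'a> {
    pub table: &'a str,
    pub proxy_addr: &'a str, // IPv4 proxy address (assumption)
    pub proxy_port: u16,
    pub log_supported: bool, // false when the kernel lacks nft_log
}

/// Pure function: render the whole ruleset as text for a single `nft -f` load.
pub fn bypass_ruleset(p: &BypassPolicy) -> String {
    let t = p.table;
    let mut s = String::new();
    // "add" guarantees the table exists so "delete" cannot fail; the file is
    // applied as one transaction, so the old table is swapped out atomically.
    writeln!(s, "add table inet {t}").unwrap();
    writeln!(s, "delete table inet {t}").unwrap();
    writeln!(s, "table inet {t} {{").unwrap();
    writeln!(s, "  chain output {{").unwrap();
    writeln!(s, "    type filter hook output priority 0; policy accept;").unwrap();
    writeln!(s, "    oifname \"lo\" accept").unwrap();
    writeln!(s, "    ct state established,related accept").unwrap();
    writeln!(s, "    ip daddr {} tcp dport {} accept", p.proxy_addr, p.proxy_port).unwrap();
    // Anything else is treated as a bypass attempt: log it if possible, then reject.
    if p.log_supported {
        writeln!(s, "    log prefix \"openshell-bypass: \" reject").unwrap();
    } else {
        writeln!(s, "    reject").unwrap();
    }
    writeln!(s, "  }}").unwrap();
    writeln!(s, "}}").unwrap();
    s
}

/// Write the ruleset to a file and build (but do not run) `nft -f <file>`.
pub fn nft_load_command(ruleset: &str, dir: &Path) -> std::io::Result<(PathBuf, Command)> {
    let path = dir.join(format!("openshell-{}.nft", std::process::id()));
    std::fs::write(&path, ruleset)?;
    let mut cmd = Command::new("nft");
    cmd.arg("-f").arg(&path);
    Ok((path, cmd))
}
```

One plausible fallback, not confirmed by the PR, is to load the `log_supported: true` variant first and, if `nft` rejects it, retry with `log_supported: false`; a rejected `nft -f` batch is not applied at all, so no half-loaded ruleset is left behind.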

VM TAP networking (openshell-driver-vm):

  • Replace iptables NAT/forwarding rules with nftables equivalents
  • New nft_ruleset module for TAP network rule generation with unit tests
  • Atomic table-per-TAP-device lifecycle (create/destroy)
  • Host-side rules provide NAT infrastructure and defense-in-depth isolation: forward chain blocks unsolicited inbound to the VM, input chain restricts VM-to-host traffic to the gateway port only
  • Primary security enforcement (proxy-only egress, bypass detection) happens inside the VM guest via the sandbox supervisor's own nftables rules
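The table-per-TAP lifecycle could look roughly like this: every rule for one device lives in its own table, so creating it is one atomic load and destroying it is one `delete table`. This sketch assumes an IPv4-only guest network (hence the `ip` family); `tap_table_name`, `nat_chain`, the `openshell_` prefix, and the example subnet are all hypothetical, and only the NAT chain is shown.

```rust
/// Hypothetical naming scheme: one nftables table per TAP device.
pub fn tap_table_name(tap: &str) -> String {
    // Keep the identifier to ASCII letters, digits, and underscores.
    let safe: String = tap
        .chars()
        .map(|c| if c.is_ascii_alphanumeric() { c } else { '_' })
        .collect();
    format!("openshell_{safe}")
}

/// NAT for the guest subnet (IPv4 `ip` family assumed).
pub fn nat_chain(guest_subnet: &str) -> String {
    format!(
        "  chain postrouting {{\n    type nat hook postrouting priority srcnat; policy accept;\n    ip saddr {guest_subnet} masquerade\n  }}\n"
    )
}

/// Create: (re)build the device's table in one atomic `nft -f` transaction.
pub fn tap_create_script(tap: &str, chains: &str) -> String {
    let t = tap_table_name(tap);
    format!("add table ip {t}\ndelete table ip {t}\ntable ip {t} {{\n{chains}}}\n")
}

/// Destroy: deleting the table removes every chain and rule for the device at once.
pub fn tap_destroy_script(tap: &str) -> String {
    format!("delete table ip {}\n", tap_table_name(tap))
}
```

Deleting the whole table, rather than removing rules one by one as an iptables-based teardown would, means a partially failed cleanup cannot leave individual rules behind.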

VM init script:

  • Load nft kernel modules at sandbox init
  • Enable nf_log_all_netns sysctl so bypass detection LOG rules work from non-init network namespaces inside the guest
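The two init-script steps amount to loading kernel modules and writing one procfs value. The PR does not show the script, so this is a Rust sketch of the equivalent operations under assumptions: the module names passed to the hypothetical `modprobe_command` are examples only, and the `root` parameter exists so the sysctl write can be exercised outside a real guest (pass `/` in practice).

```rust
use std::fs;
use std::io;
use std::path::{Path, PathBuf};
use std::process::Command;

/// Map a dotted sysctl key to its file under `<root>/proc/sys`.
pub fn sysctl_path(root: &Path, key: &str) -> PathBuf {
    let mut p = root.join("proc/sys");
    for part in key.split('.') {
        p.push(part);
    }
    p
}

/// Write a sysctl value directly to procfs.
pub fn write_sysctl(root: &Path, key: &str, value: &str) -> io::Result<()> {
    fs::write(sysctl_path(root, key), value)
}

/// Build (but do not run) a `modprobe -a` call that loads all given modules.
/// The module list is supplied by the caller; the PR does not enumerate it.
pub fn modprobe_command(modules: &[&str]) -> Command {
    let mut cmd = Command::new("modprobe");
    cmd.arg("-a").args(modules);
    cmd
}
```

In a guest, `write_sysctl(Path::new("/"), "net.netfilter.nf_log_all_netns", "1")` is the programmatic equivalent of `sysctl -w net.netfilter.nf_log_all_netns=1`.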

OCSF / docs:

  • Update firewall rule engine references from iptables to nftables
  • Update docs/security/best-practices.mdx and BYOC Dockerfile to reference nft
  • Document host-side nftables rules, host firewall interaction model, and two-layer enforcement architecture in VM driver README
  • Add Host Firewall section to docs/reference/sandbox-compute-drivers.mdx

Testing

  • mise run pre-commit passes
  • Unit tests for both nft_ruleset modules cover ruleset generation with and without log rules
  • Bypass detection e2e test (e2e/rust/tests/bypass_detection.rs) ran successfully on a Linux host with the gateway:vm driver
  • Bypass detection e2e test also verified on macOS with Podman driver

Checklist

  • Follows Conventional Commits
  • Commits are signed off (DCO)

@copy-pr-bot

copy-pr-bot Bot commented May 15, 2026

This pull request requires additional validation before any workflows can run on NVIDIA's runners.

Pull request vetters can view their responsibilities here.

Contributors can view more details about this message here.

@github-actions

Label test:e2e applied, but pull-request/1401 is at {"messa while the PR head is 7ed0df7. A maintainer needs to comment /ok to test 7ed0df79a91d6ac036fbc79d2744be59dee1ff08 to refresh the mirror. Once the mirror catches up, re-run Branch E2E Checks from the Actions tab.

@johntmyers
Collaborator

/ok to test 7ed0df7

Collaborator

@johntmyers johntmyers left a comment


Code review feedback:

  • [P1] crates/openshell-sandbox/Cargo.toml:85 moves tempfile from [dev-dependencies] into Linux-only dependencies, but many unconditional test modules still use tempfile, for example crates/openshell-sandbox/src/identity.rs:178. Non-Linux test builds will no longer resolve tempfile. Please keep the Linux runtime dependency for the new temp-file ruleset loader, but add tempfile = "3" back under [dev-dependencies].

  • [P1] crates/openshell-driver-vm/src/nft_ruleset.rs:31 installs standalone nftables forward and input base chains with accept rules. That is not equivalent to appending accepts into the existing iptables FORWARD/INPUT chains: an accept verdict in one nftables base chain does not guarantee later base chains on the same hook/firewall policy will not still drop the packet. Hosts with default-drop firewall posture can lose VM guest connectivity to the gateway or outbound network. Please preserve the old allow behavior in the effective host filter path, or explicitly preflight/document the required host firewall posture.

@russellb
Contributor Author

> Code review feedback:
>
> • [P1] crates/openshell-sandbox/Cargo.toml:85 moves tempfile from [dev-dependencies] into Linux-only dependencies, but many unconditional test modules still use tempfile, for example crates/openshell-sandbox/src/identity.rs:178. Non-Linux test builds will no longer resolve tempfile. Please keep the Linux runtime dependency for the new temp-file ruleset loader, but add tempfile = "3" back under [dev-dependencies].

Oops. I made this change on Linux and didn't test the branch on macOS again afterward; that would have caught it. Easy fix, at least.

> • [P1] crates/openshell-driver-vm/src/nft_ruleset.rs:31 installs standalone nftables forward and input base chains with accept rules. That is not equivalent to appending accepts into the existing iptables FORWARD/INPUT chains: an accept verdict in one nftables base chain does not guarantee later base chains on the same hook/firewall policy will not still drop the packet. Hosts with default-drop firewall posture can lose VM guest connectivity to the gateway or outbound network. Please preserve the old allow behavior in the effective host filter path, or explicitly preflight/document the required host firewall posture.

This one points out a fundamental difference in how rules are structured between nftables and iptables. I need to think about this one, but it definitely must be addressed before proceeding.

…enforcement

Migrate all sandbox and VM driver network policy enforcement from
iptables to nftables. nftables provides atomic ruleset loading, a
cleaner rule syntax, and is the standard netfilter interface in modern
kernels.

Sandbox bypass enforcement (openshell-sandbox):
- Replace the chain of individual iptables rule insertions with a
  single atomic nftables ruleset load via nft -f
- New nft_ruleset module with pure functions for ruleset generation
  and unit tests
- Combine log and reject rules in one inet family table (handles both
  IPv4 and IPv6 in a single ruleset)
- Fall back to reject-only ruleset when kernel lacks nft_log support
- Enable net.netfilter.nf_log_all_netns so log rules work from
  non-init network namespaces
- Use a temp file for nft ruleset loading instead of stdin, for
  compatibility with minimal VM guest environments

VM TAP networking (openshell-driver-vm):
- Replace iptables NAT/forwarding rules with nftables equivalents
- New nft_ruleset module for TAP network rule generation with unit
  tests
- Atomic table-per-TAP-device lifecycle (create/destroy)
- Host-side rules provide NAT infrastructure and defense-in-depth
  isolation (input chain restricts VM to gateway port only, forward
  chain blocks unsolicited inbound); primary security enforcement
  happens inside the VM guest via the sandbox supervisor's own rules

VM init script:
- Load nft kernel modules at sandbox init
- Enable nf_log_all_netns sysctl for bypass detection logging

OCSF / docs:
- Update firewall rule engine references from iptables to nftables
- Document host firewall interaction model and two-layer enforcement
  architecture in VM driver README and compute drivers reference

Closes NVIDIA#1335

Signed-off-by: Russell Bryant <rbryant@redhat.com>
@russellb russellb force-pushed the refactor/nftables-migration branch from 4acf125 to 158054f May 15, 2026 20:43
@russellb
Contributor Author

Re: [P1] nftables forward/input chains and host firewall interaction

Good catch on the iptables-vs-nftables semantic difference. Investigated this and landed on a defense-in-depth approach rather than trying to replicate the iptables accept-into-shared-chain behavior.

What changed:

The host-side VM driver rules serve two purposes: NAT infrastructure (required) and defense-in-depth host isolation (hardening). Primary security enforcement, meaning proxy-only egress and bypass detection, is handled by the sandbox supervisor's own nftables rules inside the VM guest, in a network namespace. The host-side rules are not the security boundary.

Given that, instead of trying to preserve the old iptables allow-passthrough semantics (which nftables can't exactly replicate across independent base chains), I added explicit drop rules at the end of the forward and input chains to make them whitelist-only:

  • forward: accepts outbound from TAP (guest-side enforcement handles the filtering), accepts established/related inbound, drops unsolicited inbound to the VM
  • input: accepts VM traffic to the gateway port, drops everything else from the VM to the host
  • postrouting: NAT masquerade (unchanged)
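A sketch of a host-side per-TAP table that follows the three bullets above, with the accepts placed before trailing drops so each chain is allowlist-only for TAP traffic while `policy accept` leaves other traffic alone. Assumed rather than taken from the PR: the `ip` family, TCP for the gateway port, and the hypothetical `TapNet` fields and example values.

```rust
use std::fmt::Write as _;

/// Hypothetical description of one VM's TAP network (IPv4 assumed).
pub struct TapNet<'a> {
    pub tap: &'a str,
    pub guest_subnet: &'a str,
    pub gateway_ip: &'a str,
    pub gateway_port: u16,
}

/// Render a per-TAP table whose forward and input chains end in explicit drops.
pub fn tap_table(table: &str, n: &TapNet) -> String {
    let (tap, gw, port, subnet) = (n.tap, n.gateway_ip, n.gateway_port, n.guest_subnet);
    let mut s = String::new();
    writeln!(s, "table ip {table} {{").unwrap();
    // forward: outbound from the guest is accepted (guest-side rules filter it),
    // replies are accepted, and unsolicited inbound to the guest is dropped.
    writeln!(s, "  chain forward {{").unwrap();
    writeln!(s, "    type filter hook forward priority filter; policy accept;").unwrap();
    writeln!(s, "    iifname \"{tap}\" accept").unwrap();
    writeln!(s, "    oifname \"{tap}\" ct state established,related accept").unwrap();
    writeln!(s, "    oifname \"{tap}\" drop").unwrap();
    writeln!(s, "  }}").unwrap();
    // input: the guest may reach only the gateway port on the host.
    writeln!(s, "  chain input {{").unwrap();
    writeln!(s, "    type filter hook input priority filter; policy accept;").unwrap();
    writeln!(s, "    iifname \"{tap}\" ip daddr {gw} tcp dport {port} accept").unwrap();
    writeln!(s, "    iifname \"{tap}\" drop").unwrap();
    writeln!(s, "  }}").unwrap();
    // postrouting: masquerade guest traffic leaving the host.
    writeln!(s, "  chain postrouting {{").unwrap();
    writeln!(s, "    type nat hook postrouting priority srcnat; policy accept;").unwrap();
    writeln!(s, "    ip saddr {subnet} masquerade").unwrap();
    writeln!(s, "  }}").unwrap();
    writeln!(s, "}}").unwrap();
    s
}
```

Rule order matters here: nftables evaluates the rules in a chain top to bottom, so the `established,related` accept has to precede `oifname ... drop`, or replies to the guest would be dropped as well.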

The policy accept on each chain means non-TAP traffic passes through untouched. On restrictive hosts, their chains can still drop TAP traffic our chains accept — a drop from any chain is final in nftables, and we can't override that. Documented this interaction model in the VM driver README and added a Host Firewall section to the compute drivers reference.
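The verdict semantics this thread turns on can be modeled in a few lines. This is a toy model, not netfilter code: each base chain on one hook is collapsed to the verdict it reaches for a single packet, and the chain names and priorities in the usage below are made up. It encodes the documented nftables behavior that `accept` only ends evaluation of the current base chain, after which the packet continues to the next base chain on the same hook, while `drop` is final.

```rust
/// Final verdict a single base chain reaches for a given packet
/// (after its rules and its policy have been applied).
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub enum Verdict {
    Accept,
    Drop,
}

/// A base chain registered on some hook (e.g. forward), reduced to what matters here.
pub struct BaseChain {
    pub name: &'static str,
    pub priority: i32,
    pub verdict: Verdict,
}

/// Walk every base chain on one hook in priority order (lower value first).
/// `Accept` only ends the current chain; `Drop` ends processing immediately.
/// Returns the overall verdict and, on a drop, the chain responsible.
pub fn evaluate_hook(chains: &[BaseChain]) -> (Verdict, Option<&'static str>) {
    let mut ordered: Vec<&BaseChain> = chains.iter().collect();
    ordered.sort_by_key(|c| c.priority);
    for c in ordered {
        if c.verdict == Verdict::Drop {
            return (Verdict::Drop, Some(c.name));
        }
    }
    (Verdict::Accept, None)
}
```

With a per-TAP forward chain at priority 0 accepting and a hypothetical host chain at priority 10 dropping, `evaluate_hook` reports a drop from the host chain: the default-drop host case the review raised, which no accept in the OpenShell table can override.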

@russellb russellb requested a review from johntmyers May 15, 2026 20:47

Labels

test:e2e Requires end-to-end coverage

Projects

None yet

Development

Successfully merging this pull request may close these issues.

refactor: replace iptables with nftables for sandbox and VM networking

2 participants